41 research outputs found

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. They fall into three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval. (Published or submitted for publication.)

    The Astrolabe Project: Identifying and Curating Astronomical Dark Data through Development of Cyberinfrastructure Resources

    As research datasets and analyses grow in complexity, data that could be valuable to other researchers and that could support the integrity of published work remain uncurated across disciplines. These data are especially concentrated in the Long Tail of funded research, where curation resources and related expertise are often inaccessible. In the domain of astronomy, it is undisputed that uncurated dark data exist, but the scope of the problem remains uncertain. The Astrolabe Project is a collaboration between University of Arizona researchers, the CyVerse cyberinfrastructure environment, and the American Astronomical Society, with a mission to identify and ingest previously uncurated astronomical data, to provide a robust computational environment for analysis and sharing of data, and to offer services for authors wishing to deposit data associated with publications. Following expert feedback obtained through two workshops held in 2015 and 2016, Astrolabe is funded in part by the National Science Foundation. The system is being actively developed within CyVerse, and Astrolabe collaborators are soliciting heterogeneous datasets and potential users for the prototype system. Astrolabe team members are currently working to characterize the properties of uncurated astronomical data and to develop automated methods for locating potentially useful data to be targeted for ingest into Astrolabe, while cultivating a user community for the new data management system. Comment: To be published in Proceedings of Library and Information Services in Astronomy (LISA) VIII; conference held in Strasbourg, France, June 6-9, 201

    Astrolabe: Curating, Linking and Computing Astronomy's Dark Data

    Where appropriate repositories are not available to support all relevant astronomical data products, data can fall into darkness: unseen and unavailable for future reference and re-use. Some data in this category are legacy or old data, but newer datasets are also often uncurated and could remain "dark". This paper describes the design motivation and development of Astrolabe, a cyberinfrastructure project that addresses a set of community recommendations for locating and ensuring the long-term curation of dark or otherwise at-risk data and integrated computing. This paper also describes the outcomes of the series of community workshops that informed the creation of Astrolabe. According to participants in these workshops, much astronomical dark data currently exist that are not curated elsewhere, as well as software that can only be executed by a few individuals and therefore becomes unusable because of changes in computing platforms. Astronomical research questions and challenges would be better addressed with integrated data and computational resources that fall outside the scope of existing observatory and space mission projects. As a solution, the design of the Astrolabe system is aimed at developing new resources for management of astronomical data. The project is based in CyVerse cyberinfrastructure technology and is a collaboration between the University of Arizona and the American Astronomical Society. Overall, the project aims to support open access to research data by leveraging existing cyberinfrastructure resources and to promote scientific discovery by making potentially useful data broadly available to the astronomical community in a computable format. Comment: Accepted for publication in the Astrophysical Journal Supplement Series, 22 pages, 2 figures

    Augmenting optical character recognition (OCR) for improved digitization: Strategies to access scientific data in natural history collections

    The Augmenting OCR Working Group (A-OCR WG) at Integrated Digitized Biocollections (iDigBio) seeks to improve community OCR strategies and algorithms for faster, better parsing of OCR output derived from valuable data on natural history collection specimen labels. This task is exceedingly difficult because museum labels are often annotated and vary in content, form, and font. Under the National Science Foundation's (NSF) Advancing Digitization of Biological Collections (ADBC) program, iDigBio is building a cyberinfrastructure to aggregate quality data from museum specimens housed in collections across the United States for use by researchers, educators, environmentalists, and the public. Since March of 2012, when the A-OCR WG formed from community consensus to begin its role in this endeavor, it has been defining reachable goals, including setting up a hackathon concurrent with iConference 2013. This paper reports on the definition of some key problems identified by the A-OCR WG, since these science problems will drive research and cyberinfrastructure development. (Published or submitted for publication; peer reviewed.)

    Datasphere at the Biosphere II: Computation and data in the wild

    Biological field stations provide a unique set of opportunities and challenges for digital curation. The stations serve as the center of short-term and long-term biological research, from biomolecular-scale to ecosystem-scale research. They represent some of the last remaining “natural” areas in certain regions. Stations provide unique information about local biotic and abiotic conditions. Data shared among the stations support continental-scale and global research initiatives. The stations themselves support a large number of researchers, who often come from multiple universities and other research and teaching institutions around the world. Because of this decentralized user base, it is particularly difficult for stations to capture data and other research products generated by research at the stations. The authors, part of a larger NSF-funded “Empowering Long Tail Research” project (NSF:#1216872), conducted a survey of field station researchers and then held a two-day workshop to identify challenges and opportunities for “grand challenge” research questions that could be enabled through development of cyberinfrastructure. The information gathered through this study will inform future proposals for cyberinfrastructure development.

    Graduate Curriculum for Biological Information Specialists: A Key to Integration of Scale in Biology

    Scientific data problems do not stand in isolation. They are part of a larger set of challenges associated with the escalation of scientific information and changes in scholarly communication in the digital environment. Biologists in particular are generating enormous sets of data at a high rate, and new discoveries in the biological sciences will increasingly depend on the integration of data across multiple scales. This work will require new kinds of information expertise in key areas. To build this professional capacity, we have developed two complementary educational programs: a Biological Information Specialist (BIS) master's degree and a concentration in Data Curation (DC). We believe that BISs will be central in the development of cyberinfrastructure and information services needed to facilitate interdisciplinary and multi-scale science. Here we present three sample cases from our current research projects to illustrate areas in which we expect information specialists to make important contributions to biological research practice.

    O Serviço de documentação textual e iconografia do Museu Paulista

    This article takes stock of the curatorial work carried out during the 1990s by the current Textual Documentation and Iconography Service (Serviço de Documentação Textual e Iconografia) of the Museu Paulista, which is responsible for the MP Fonds / Permanent Archive (Fundo MP/Arquivo Permanente), hundreds of collections and textual fonds, and 50,000 iconographic items, most of them gathered in photographic collections. It shows how the documentation work extends beyond the limits of SVDHICO to integrate with the museum's activities as a whole and with other research groups. It also points to new methodologies for working with images that allow curatorship to be carried out in a way that is integrated with interdisciplinary research and cultural dissemination.

    Shedding Light on the Dark Data in the Long Tail of Science

    One of the primary outputs of the scientific enterprise is data, but many institutions, such as libraries, that are charged with preserving and disseminating scholarly output have largely ignored this form of documentation of scholarly activity. This paper focuses on a particularly troublesome class of data, termed "dark data". "Dark data" is not carefully indexed and stored, so it becomes nearly invisible to scientists and other potential users and is therefore more likely to remain underutilized and eventually be lost. The article discusses how concepts from long tail economics can be used to understand potential solutions for better curation of this data. The paper describes why this data is critical to scientific progress, some of the properties of this data, and some social and technical barriers to proper management of this class of data. Many potentially useful institutional, social, and technical solutions are under development and are introduced in the last sections of the paper, but these solutions are largely unproven and require additional research and development. (Published or submitted for publication; not peer reviewed.)